The following data describe building damage from the New Orleans tornado of December 2022. The data were obtained from Verisk Analytics and were derived using computer vision and machine learning on post-catastrophe aerial imagery. There are approximately 42,000 buildings in this dataset.
Here is some general information on what each type of score means in the context of tornado roof damage:
Here are some interactive before and after aerial images that were taken:
This is an example of a building that has a catastrophe score of 100 (FEMA 6 / Destroyed)
This is another example of a building that has a catastrophe score of 100 (FEMA 6 / Destroyed)
This is an example of a building that has a catastrophe score of approximately 60 (FEMA 4 / Major)
I converted roof_solar into a logical (TRUE/FALSE) column by mapping “SOLAR PANEL” to TRUE and “NO SOLAR PANEL” to FALSE. I also set to NA the roof shapes that the computer was not very sure about (a roof-shape score below 0.80, i.e. more than a 20% chance of being incorrect). Some cells in damage_level held an empty string, so I converted those to NA as well. I then separated longitude and latitude so that they could be read easily into leaflet.
library(dplyr)  # %>%, mutate(), select(), case_when()
library(tidyr)  # separate()

df <- read.csv("clean_data.csv") %>%
  janitor::clean_names() %>%
  # "SOLAR PANEL" -> TRUE, "NO SOLAR PANEL" -> FALSE (anything else stays NA)
  mutate(roofsolar = case_when(roofsolar == "SOLAR PANEL" ~ TRUE,
                               roofsolar == "NO SOLAR PANEL" ~ FALSE)) %>%
  # Blank out roof shapes whose classification score is below 0.80
  mutate(roofshape = ifelse(roofshascr < 0.80, NA, roofshape)) %>%
  select(-c(roofshascr, roofcondit_discolordetect, roofcondit_discolorscore, roofcondit_discolorpercen, trampscr, roofcondit_tarppercen))

# Turn "POINT (long lat)" into separate numeric long and lat columns
df$rooftopgeo <- gsub("POINT \\(|\\)", "", df$rooftopgeo)
df <- df %>%
  separate(rooftopgeo, into = c("long", "lat"), sep = " ", convert = TRUE)

# Empty damage levels become NA
df$damage_level <- ifelse(df$damage_level == "", NA, df$damage_level)

# Fix the factor level order; any value not listed (such as "gravel") becomes NA
df$roofshape <- factor(df$roofshape, levels = c("gable", "hip", "flat"))
levels_roofmateri <- c("metal", "shingle", "membrane", "shake", "tile")
df$roofmateri <- factor(df$roofmateri, levels = levels_roofmateri)
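The cleaning steps above can be tried on a tiny data frame without the real file. This is a minimal base-R sketch: the two rows, their coordinates, and the "Major" damage level are made up for illustration, and only the column names and the "POINT (long lat)" layout are taken from the code above.

```r
# Toy rows in the same layout as the real columns (all values are made up).
toy <- data.frame(
  roofsolar    = c("SOLAR PANEL", "NO SOLAR PANEL"),
  damage_level = c("", "Major"),
  rooftopgeo   = c("POINT (-90.0715 29.9511)", "POINT (-90.1000 29.9000)"),
  stringsAsFactors = FALSE
)

# Solar flag: TRUE only for the exact string "SOLAR PANEL".
toy$roofsolar <- toy$roofsolar == "SOLAR PANEL"

# Empty strings in damage_level become NA.
toy$damage_level[toy$damage_level == ""] <- NA

# Strip "POINT (" and ")" and split the remainder on the single space.
coords <- gsub("POINT \\(|\\)", "", toy$rooftopgeo)
parts  <- strsplit(coords, " ", fixed = TRUE)
toy$long <- as.numeric(vapply(parts, `[`, character(1), 1))
toy$lat  <- as.numeric(vapply(parts, `[`, character(1), 2))

print(toy[, c("roofsolar", "damage_level", "long", "lat")])
```

Longitude comes first in the POINT string, which is why the first piece is assigned to `long`.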
Catastrophe score is an aggregate of missing material, structural damage, and a few other attributes.
Typically, insurance agencies consider gable roof shapes as more prone to damage than hip roof shapes. This graph illustrates the proportion of roofs of a certain shape:
In addition to this, shingle roofs are more easily damaged than metal or tile roofs, which is important to keep in mind because shingle roofs are extremely common in the United States. The graph below illustrates the proportion of roofs made with each material (shingle roofs are clearly the most common):
The catastrophe scores are split into groups based on the summary of the dataset with the scores of 0 excluded (that summary is shown in the Models section):
mostdamage <- df %>% filter(catastrophescore >= 50)
nodamage <- df %>% filter(catastrophescore == 0)
decimated <- df %>% filter(catastrophescore == 100)
middamage <- df %>% filter(catastrophescore < 50 & catastrophescore >= 15)
leastdamage <- df %>% filter(catastrophescore < 15 & catastrophescore >= 2)
minimaldamage <- df %>% filter(catastrophescore == 1)
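As an alternative to one filter per group, the same cut points can be expressed as a single factor column with `cut()`. This is only a sketch under two assumptions: the scores are whole numbers, and the scores and bin labels below are made up for illustration. Buildings with a score of 100 fall inside the "most" bin here, just as `decimated` is a subset of `mostdamage` above.

```r
# Made-up integer scores covering every boundary used in the filters above.
scores <- c(0, 2, 14, 15, 49, 50, 99, 100)

# right = FALSE makes each bin closed on the left: [0,1), [1,2), [2,15), ...
damage_bin <- cut(
  scores,
  breaks = c(0, 1, 2, 15, 50, Inf),
  labels = c("none", "minimal", "least", "mid", "most"),
  right  = FALSE
)

# Count how many buildings land in each bin.
print(table(damage_bin))
```

A single factor like this can then be used to split, colour, or facet the data without keeping six separate data frames in sync.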
NOTE: Red marks the most damaged buildings (catastrophe score >= 50), orange marks scores above 25 and below 50, and blue marks scores at or below 25 (excluding scores of 0). Only 3,852 buildings had a nonzero catastrophe score; the remaining 37,967 had a score of 0 and are shown in gray.
This map shows all of the catastrophe scores. The vast majority of roofs have no damage, which is denoted in gray.
Map of the buildings that experienced no damage. These are all of the roofs that have a catastrophe score of 0:
Map of the buildings that experienced any form of damage:
NOTE: Red marks the most damaged buildings (catastrophe score at or above 50), orange marks scores above 25 and below 50, and blue marks scores at or below 25.
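The color rule described in the notes can be written as a small helper. This is a sketch rather than the code used for the maps: the function name `score_color` is hypothetical, and the thresholds are copied from the notes (50, 25, and 0).

```r
# Map a catastrophe score to the color described in the notes.
# score_color is a hypothetical helper, not part of the original analysis.
score_color <- function(score) {
  ifelse(score >= 50, "red",
    ifelse(score > 25, "orange",
      ifelse(score > 0, "blue", "gray")))
}

# Made-up scores that sit on and around each threshold.
example_scores <- c(0, 1, 25, 26, 49, 50, 100)
print(data.frame(score = example_scores, color = score_color(example_scores)))
```

A vector like this could be passed to the `color` argument of `leaflet::addCircleMarkers()` together with the `long` and `lat` columns, assuming the maps are drawn with circle markers.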
Let’s break down the damage:
Map of the buildings that experienced the least damage. The catastrophe scores seen here are more than or equal to 2 and less than 15:
Map of the buildings that experienced mid-range catastrophe scores. These scores range from 15 up to, but not including, 50:
Map of the buildings that experienced the most damage. These buildings have catastrophe scores at or above 50: (Click the points!)
Map of the buildings that were completely destroyed. These have catastrophe scores of 100: (Click on the points here too!)
Since most of the buildings in this dataset were not damaged by the tornado, the distribution of catastrophe scores is heavily skewed toward 0. This can be seen in the summary below:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 0.000 2.217 0.000 100.000
Due to this, I made models that exclude the catastrophe scores of 0, so that they look only at the structures that experienced damage.
In addition to this, the models I have made cannot fully describe the data, because the data were derived from aerial imagery using computer vision and machine learning. The computer vision models themselves have inherent error rates, sometimes as high as 30% or 40%. Tornadoes are also inherently chaotic: they have a tendency to bounce around, which leads to seemingly random interactions with the structures in their path.
Below is the summary for the structures that exhibited damage:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 4.00 15.00 28.64 46.00 100.00
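The effect of the zeros on the summary can be reproduced on a small made-up vector. This sketch assumes nothing about the real data beyond "mostly zeros, a few damaged buildings"; the values are invented for illustration.

```r
# 90 undamaged buildings and 10 damaged ones (made-up scores).
scores <- c(rep(0, 90), 2, 4, 4, 10, 15, 20, 30, 46, 80, 100)

# With the zeros included, the quartiles all collapse to 0.
print(summary(scores))

# Keeping only nonzero scores shows the spread among damaged buildings.
damaged <- scores[scores > 0]
print(summary(damaged))
```

When more than three quarters of the values are 0, the median and third quartile are both 0, so only the mean and maximum carry any information about the damaged buildings.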
Models
## # Comparison of Model Performance Indices
##
## Name | Model | AIC (weights) | AICc (weights) | BIC (weights) | R2 | RMSE | Sigma
## ---------------------------------------------------------------------------------------------
## mods1 | glm | 26272.9 (<.001) | 26272.9 (<.001) | 26314.3 (<.001) | 0.095 | 28.657 | 28.689
## mods2 | glm | 26272.9 (<.001) | 26272.9 (<.001) | 26314.3 (<.001) | 0.095 | 28.657 | 28.689
## mods3 | glm | 26094.1 (>.999) | 26094.1 (>.999) | 26147.3 (>.999) | 0.147 | 27.816 | 27.857
## mods4 | glm | 30925.4 (<.001) | 30925.5 (<.001) | 30968.0 (<.001) | 0.156 | 28.795 | 28.821
## mods5 | glm | 30773.9 (<.001) | 30773.9 (<.001) | 30828.6 (<.001) | 0.176 | 28.402 | 28.437
Out of the models I made, Model 3 appeared to work best: it has the lowest AIC, BIC, and RMSE in the table above. It should be noted, though, that none of these models fit particularly well with the variables used.
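For reference, this kind of comparison can be sketched with base R alone. The data below are simulated and the variable names `tree` and `material` are placeholders, not the real columns; the table above appears to come from `performance::compare_performance()`, which is an assumption on my part. One caveat worth keeping in mind: AIC and BIC are only comparable between models fit to the same observations.

```r
set.seed(1)

# Simulated stand-in data (not the real dataset).
n <- 200
sim <- data.frame(
  tree     = runif(n, 0, 100),
  material = factor(sample(c("metal", "shingle"), n, replace = TRUE))
)
sim$score <- 20 + 0.3 * sim$tree - 10 * (sim$material == "shingle") +
  rnorm(n, sd = 15)

# Two nested Gaussian GLMs fit to the same rows.
m_small <- glm(score ~ tree, family = gaussian(link = "identity"), data = sim)
m_large <- glm(score ~ tree + material, family = gaussian(link = "identity"), data = sim)

# In-sample root mean squared error of a fitted model.
rmse <- function(m) sqrt(mean(residuals(m, type = "response")^2))

# Lower AIC and lower RMSE both point toward the better-fitting model.
aic_tab <- AIC(m_small, m_large)
aic_tab$RMSE <- c(rmse(m_small), rmse(m_large))
print(aic_tab)
```

Because the larger model nests the smaller one, its in-sample RMSE can never be higher; AIC adds a penalty for the extra parameter, so it is the more informative column for choosing between them.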
Model 3
check_model(mods3, type = "pearson")
Correlation plot between variables using Pearson correlation coefficient
summary(mods5)
##
## Call:
## glm(formula = catastrophescore ~ long + roofmateri + rooftree +
## enclosure, family = gaussian(link = "identity"), data = extra)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -75.62 -18.17 -8.69 13.16 82.23
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4460.13497 1149.62714 3.880 0.000107 ***
## long 49.03688 12.76374 3.842 0.000124 ***
## roofmaterishingle -23.29005 1.43088 -16.277 < 2e-16 ***
## roofmaterimembrane 12.42319 2.18262 5.692 1.37e-08 ***
## roofmaterishake -21.90493 4.73851 -4.623 3.94e-06 ***
## roofmateritile -22.84290 7.71403 -2.961 0.003087 **
## rooftree 0.56774 0.06876 8.257 < 2e-16 ***
## enclosureTRUE 44.80193 10.78228 4.155 3.34e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 808.6766)
##
## Null deviance: 3157376 on 3226 degrees of freedom
## Residual deviance: 2603130 on 3219 degrees of freedom
## (10 observations deleted due to missingness)
## AIC: 30774
##
## Number of Fisher Scoring iterations: 2
vif(mods5)
## GVIF Df GVIF^(1/(2*Df))
## long 1.010340 1 1.005157
## roofmateri 1.020779 4 1.002574
## rooftree 1.012753 1 1.006356
## enclosure 1.004155 1 1.002076
Root mean squared error for Model 3
## [1] 27.81604
Root median squared error for Model 3
## [1] 17.34827
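Both error measures take the square root of a summary of the squared errors; they differ only in using the mean or the median. A minimal sketch with made-up actual and predicted values:

```r
# Made-up actual and predicted scores; the last prediction misses badly.
actual    <- c(10, 20, 30, 40, 100)
predicted <- c(12, 18, 33, 35, 60)

sq_err <- (actual - predicted)^2   # 4, 4, 9, 25, 1600

# Root mean squared error: pulled up by the single large miss.
rmse <- sqrt(mean(sq_err))

# Root median squared error: reflects the typical miss instead.
rmedse <- sqrt(median(sq_err))

print(c(rmse = rmse, rmedse = rmedse))
```

The median version is much less sensitive to a handful of large misses, which is consistent with it being the smaller of the two values reported for Model 3.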
Based on Model 3, I have made model predictions:
Here is a comparison of the predicted vs the actual catastrophe score:
I then plotted the predicted catastrophe scores alongside the actual catastrophe scores for reference.
The variables included in this dataset turned out to be of limited use for predicting catastrophe scores accurately, as the graph above shows. More information would need to be considered, specifically information about the tornado itself.
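The predicted-versus-actual comparison can be sketched as follows. The data are simulated and the single predictor `tree` is a placeholder; this illustrates the workflow with `predict()` and is not a reproduction of Model 3.

```r
set.seed(42)

# Simulated stand-in data, with scores clamped to the 0-100 range.
n <- 100
sim <- data.frame(tree = runif(n, 0, 100))
sim$score <- pmin(100, pmax(0, 10 + 0.5 * sim$tree + rnorm(n, sd = 20)))

# Gaussian GLM with identity link, the same family as in the model summary.
fit <- glm(score ~ tree, family = gaussian(link = "identity"), data = sim)

# Put actual and predicted scores side by side.
comparison <- data.frame(
  actual    = sim$score,
  predicted = as.numeric(predict(fit, newdata = sim, type = "response"))
)
comparison$residual <- comparison$actual - comparison$predicted

print(head(comparison))
```

Plotting `comparison$predicted` against `comparison$actual` and adding `abline(0, 1)` gives a quick visual check: points far from the diagonal are the buildings the model predicts poorly.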